Private Record Linkage: A Comparison of Selected Techniques for Name Matching


Abstract

The rise of Big Data Analytics has shown the utility of analyzing all aspects of a problem by bringing together disparate data sets. Efficient and accurate private record linkage algorithms are necessary to achieve this. However, records are often linked based on personally identifiable information, and protecting the privacy of individuals is critical. This work contributes to this field by studying an important component of the private record linkage problem: linking based on names while keeping those names encrypted, both on disk and in memory. We explore the applicability, accuracy, speed, and security of three different primary approaches to this problem (along with several variations) and compare the results to common name-matching metrics on unprotected data. While these approaches are not new, this work provides a thorough analysis on a range of datasets containing systematically introduced flaws common to name-based data entry, such as typographical errors, optical character recognition errors, and phonetic errors. Additionally, we evaluate the privacy level of the q-gram-based metrics by simulating the frequency analysis attack that can follow a data breach. We show that, for the use case we are considering, the best choice of string metric is a padded q-gram-based metric, which can provide high record linkage accuracy and is resilient to frequency analysis attacks under certain conditions.
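To make the idea of a padded q-gram-based metric concrete, the following is a minimal sketch and not the thesis's implementation. It assumes bigrams (q = 2), `#` as the padding character, the Dice coefficient as the similarity measure, and HMAC-SHA256 with a shared secret key as a stand-in for the protection step; the function names `padded_qgrams`, `protect`, and `dice` are hypothetical.

```python
import hashlib
import hmac


def padded_qgrams(name, q=2, pad="#"):
    """Return the set of q-grams of `name` after padding both ends.

    Padding with q-1 copies of `pad` gives the first and last
    characters their own q-grams. (Assumption: sets, not multisets.)
    """
    padded = pad * (q - 1) + name.lower() + pad * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def protect(qgrams, key):
    """Replace each q-gram with its keyed hash (HMAC-SHA256).

    Assumption: both parties share `key`, so equal q-grams map to
    equal tokens and set overlap can be computed on the tokens alone.
    """
    return {
        hmac.new(key, g.encode("utf-8"), hashlib.sha256).hexdigest()
        for g in qgrams
    }


def dice(a, b):
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"shared-secret"  # hypothetical key, for illustration only
plain = dice(padded_qgrams("Smith"), padded_qgrams("Smyth"))
hidden = dice(
    protect(padded_qgrams("Smith"), key),
    protect(padded_qgrams("Smyth"), key),
)
print(plain, hidden)
```

Because the keyed hash is deterministic, the similarity computed on protected tokens matches the similarity on the plaintext q-grams. The same determinism means that token frequencies mirror q-gram frequencies, which is what makes a frequency analysis attack a concern for this family of metrics.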

Bibliographic record

  • Author

    Grzebala, Pawel B.

  • Year 2016
  • Original format PDF
